27baa73 to bf988bc
90516e1 to c85af30
39dca9c to 51f681b
run-ci: all

@GNiendorf I like the progress here very much so far.
Please check the µ cube sample.
b70bd1c to be79ee2

run-ci: all
The PR was built and ran successfully in standalone mode running on CPU. Here are some of the comparison plots.
The full set of validation and comparison plots can be found here. Here is a timing comparison:

The PR was built and ran successfully with CMSSW running on CPU. Here are some plots.
OOTB All Tracks
The full set of validation and comparison plots can be found here.
be79ee2 to 27b2ca2
Just checked, and there are no differences in the plots on the cube 50 sample. I will start going through this PR more carefully now and double-check things before I open it for review; this may take a while.

Manos suggested breaking this PR up, and I think that makes sense. Going to open the first one shortly.
- alpaka::math::asin(
-     acc, alpaka::math::min(acc, sdOut_dr * k2Rinv1GeVf / alpaka::math::abs(acc, pt_beta), kSinAlphaMax)),
- betaOut);
+ acc, alpaka::math::min(acc, sdOut_dr * k2Rinv1GeVf / alpaka::math::abs(acc, pt_beta), kSinAlphaMax), betaOut);
Please collect the non-technical changes, like these approximations, in a separate collection of commits (or a separate PR).
For these asin replacements, perhaps it's better to ifdef or alias a function in Common.h, e.g. approxAsin, to then select the first-order or the full approximation at compile time.
Numerical reformulations (like the fastDeltaPhi) should be OK in the ~technical category.
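A minimal sketch of what such a compile-time switch could look like. Only the name approxAsin comes from the comment above; the macro LST_USE_FULL_ASIN and the flag kUseFullAsin are hypothetical, and std::asin stands in for the alpaka call so the snippet compiles on its own. It is an illustration of the idea, not the code in this PR.

```cpp
#include <cmath>

// Hypothetical switch (name is an assumption, not taken from the PR):
// build with -DLST_USE_FULL_ASIN to use the full arcsine.
#ifdef LST_USE_FULL_ASIN
inline constexpr bool kUseFullAsin = true;
#else
inline constexpr bool kUseFullAsin = false;
#endif

// approxAsin is the name suggested in the review; the body is only a sketch.
// In the real code the full branch would presumably forward to
// alpaka::math::asin(acc, x) instead of std::asin.
inline float approxAsin(float x) {
  if constexpr (kUseFullAsin) {
    return std::asin(x);  // full evaluation
  } else {
    return x;  // first-order term of asin(x) = x + x^3/6 + ...
  }
}
```

Call sites would then use approxAsin(...) everywhere, so the physics-affecting choice lives in one place and can be toggled per build when comparing validation plots and timing.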
To elaborate on Gavin's message: my suggestion to him was exactly to factor out the technical changes from the ones that affect the physics performance (I notice some performance changes in the validation plots), and to test (and potentially include) any change that may negatively affect the GPU timing separately as well, at least in a separate commit.

Just testing various random improvements/optimizations that seem to help CPU timing on lnx4555. Still very much a work in progress; the code needs to be cleaned up and refined a bit.
This PR Timing (CPU)

Master Timing (CPU)
